Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with the COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people and those with underlying medical problems such as cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness. The best way to prevent and slow down transmission is to be well informed about the COVID-19 virus, the disease it causes, and how it spreads. Protect yourself and others from infection by frequently washing your hands or using an alcohol-based rub, and by not touching your face. The COVID-19 virus spreads primarily through droplets of saliva or discharge from the nose when an infected person coughs or sneezes.
The purpose of this research project is to summarize global COVID-19 data, provide detailed reports, and supply information for the planning and evaluation of health services.
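# Libraries
import pandas as pd  # data processing
import plotly.express as px  # visualization library
# Load the dataset (the CSV file is expected in the working directory)
data = pd.read_csv("covid_19_clean_complete.csv")
# Replace missing values (such as an empty Province/State) with empty strings
data = data.fillna("")
# Check the top 10 rows of the dataset
data.head(10)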
| | Province/State | Country/Region | Lat | Long | Date | Confirmed | Deaths | Recovered | Active | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | Afghanistan | 33.93911 | 67.709953 | 2020-01-22 | 0 | 0 | 0 | 0 | Eastern Mediterranean |
| 1 | | Albania | 41.15330 | 20.168300 | 2020-01-22 | 0 | 0 | 0 | 0 | Europe |
| 2 | | Algeria | 28.03390 | 1.659600 | 2020-01-22 | 0 | 0 | 0 | 0 | Africa |
| 3 | | Andorra | 42.50630 | 1.521800 | 2020-01-22 | 0 | 0 | 0 | 0 | Europe |
| 4 | | Angola | -11.20270 | 17.873900 | 2020-01-22 | 0 | 0 | 0 | 0 | Africa |
| 5 | | Antigua and Barbuda | 17.06080 | -61.796400 | 2020-01-22 | 0 | 0 | 0 | 0 | Americas |
| 6 | | Argentina | -38.41610 | -63.616700 | 2020-01-22 | 0 | 0 | 0 | 0 | Americas |
| 7 | | Armenia | 40.06910 | 45.038200 | 2020-01-22 | 0 | 0 | 0 | 0 | Europe |
| 8 | Australian Capital Territory | Australia | -35.47350 | 149.012400 | 2020-01-22 | 0 | 0 | 0 | 0 | Western Pacific |
| 9 | New South Wales | Australia | -33.86880 | 151.209300 | 2020-01-22 | 0 | 0 | 0 | 0 | Western Pacific |
With the COVID-19 dataset, we can analyze confirmed, death, active, and recovered cases around the world. The resulting report can then serve as a reference for governments. Why did I choose the COVID-19 dataset instead of a crime dataset? Because it can help a government assess the status of the disease and plan its response, for example by adjusting its medical system or locking down the country. The weakness of this research is that it cannot draw on the latest data and information.
Province/State: This column provides the province or state name (blank when a country is reported as a whole).
Country/Region: This column provides the country or region name.
Lat: This column provides the latitude of the location.
Long: This column provides the longitude of the location.
Date: This column provides dates from January 2020 to July 2020.
Confirmed: This column provides the total number of confirmed cases.
Deaths: This column provides the total number of deaths.
Recovered: This column provides the total number of recovered cases.
Active: This column provides the total number of active cases.
WHO Region: This column provides the WHO region name.
# Information about the dataset
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49068 entries, 0 to 49067
Data columns (total 10 columns):
| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | Province/State | 49068 non-null | object |
| 1 | Country/Region | 49068 non-null | object |
| 2 | Lat | 49068 non-null | float64 |
| 3 | Long | 49068 non-null | float64 |
| 4 | Date | 49068 non-null | object |
| 5 | Confirmed | 49068 non-null | int64 |
| 6 | Deaths | 49068 non-null | int64 |
| 7 | Recovered | 49068 non-null | int64 |
| 8 | Active | 49068 non-null | int64 |
| 9 | WHO Region | 49068 non-null | object |
dtypes: float64(2), int64(4), object(4)
memory usage: 3.7+ MB
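The dataset has 49,068 rows and 10 columns. No column reports missing values, but for Province/State this is only because the missing entries were replaced with empty strings when the data was loaded.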
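The summary above lists the Date column with dtype `object`, meaning the dates are stored as strings. A minimal sketch of converting such a column to real datetimes, using two hypothetical sample values rather than the actual file (the end date here is illustrative only):

```python
import pandas as pd

# Hypothetical sample values standing in for the real Date column.
dates = pd.Series(["2020-01-22", "2020-07-27"], name="Date")

# Parse ISO-formatted strings into datetime values.
parsed = pd.to_datetime(dates, format="%Y-%m-%d")

# With datetimes, date arithmetic becomes possible.
span_days = (parsed.max() - parsed.min()).days
print(span_days)
```

The charts in this notebook work with the string dates as they are, because ISO-formatted dates sort correctly as text; the conversion mainly helps when date arithmetic or resampling is needed.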
# More technical overview
data.describe()
| | Lat | Long | Confirmed | Deaths | Recovered | Active |
|---|---|---|---|---|---|---|
| count | 49068.000000 | 49068.000000 | 4.906800e+04 | 49068.000000 | 4.906800e+04 | 4.906800e+04 |
| mean | 21.433730 | 23.528236 | 1.688490e+04 | 884.179160 | 7.915713e+03 | 8.085012e+03 |
| std | 24.950320 | 70.442740 | 1.273002e+05 | 6313.584411 | 5.480092e+04 | 7.625890e+04 |
| min | -51.796300 | -135.000000 | 0.000000e+00 | 0.000000 | 0.000000e+00 | -1.400000e+01 |
| 25% | 7.873054 | -15.310100 | 4.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000e+00 |
| 50% | 23.634500 | 21.745300 | 1.680000e+02 | 2.000000 | 2.900000e+01 | 2.600000e+01 |
| 75% | 41.204380 | 80.771797 | 1.518250e+03 | 30.000000 | 6.660000e+02 | 6.060000e+02 |
| max | 71.706900 | 178.065000 | 4.290259e+06 | 148011.000000 | 1.846641e+06 | 2.816444e+06 |
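The summary above shows a minimum of -14 in the Active column, which cannot be a real count of people. A minimal sketch of how such rows might be located, assuming Active is meant to equal Confirmed minus Deaths minus Recovered (an assumption, not something the dataset description states) and using a small hypothetical frame in place of the real data:

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset.
sample = pd.DataFrame({
    "Country/Region": ["A", "B", "C"],
    "Confirmed": [100, 50, 10],
    "Deaths": [5, 2, 1],
    "Recovered": [60, 50, 9],
    "Active": [35, -2, 1],
})

# Rows where the active count is negative.
negative_active = sample[sample["Active"] < 0]

# Rows where Active differs from the assumed definition.
expected_active = sample["Confirmed"] - sample["Deaths"] - sample["Recovered"]
inconsistent = sample[sample["Active"] != expected_active]

print(negative_active)
print(inconsistent)
```

Applied to the real data, the same two filters would show which locations and dates produce the negative values, which is usually worth knowing before ranking countries by active cases.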
# Rename the columns
data.rename(columns={"Province/State": "state", "Country/Region": "country"}, inplace=True)
# Keep the last row for each country. Note: for a country reported by
# province/state, this is its last listed row, not a national total.
confirmed = data.drop_duplicates(subset=["country"], keep="last")
fig = px.choropleth(confirmed, locations="country", locationmode="country names",
                    color="Confirmed", hover_name="Confirmed",
                    color_continuous_scale="portland",
                    template="plotly_dark",
                    title="World map of Confirmed Cases")
fig.show()
As the world map above shows, the United States has the most confirmed cases, with a total of 4,290,259.
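One caveat about the selection step: `drop_duplicates(subset=["country"], keep="last")` keeps a single row per country. For countries that the dataset reports by province or state (Australia, in the preview table), that row covers only one province. A hedged sketch of an alternative that sums all provinces on the latest date, shown on a small hypothetical frame that uses the renamed `state` and `country` columns:

```python
import pandas as pd

# Hypothetical sample: country X is reported by province, country Y as a whole.
sample = pd.DataFrame({
    "state": ["P1", "P2", "", "P1", "P2", ""],
    "country": ["X", "X", "Y", "X", "X", "Y"],
    "Date": ["2020-07-26"] * 3 + ["2020-07-27"] * 3,
    "Confirmed": [10, 20, 5, 12, 25, 6],
})

# Select the latest date (ISO-formatted date strings sort correctly as text).
latest_date = sample["Date"].max()
latest = sample[sample["Date"] == latest_date]

# Sum every province within each country to get national totals.
totals = latest.groupby("country", as_index=False)["Confirmed"].sum()

# For comparison: the single-row selection used in this notebook.
last_row = sample.drop_duplicates(subset=["country"], keep="last")

print(totals)
print(last_row[["country", "Confirmed"]])
```

In the sample, the two approaches agree for Y but differ for X, where the last row holds only province P2. Whether this changes any ranking in the real data depends on which countries are split by province.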
# Keep the last row for each country
deaths = data.drop_duplicates(subset=["country"], keep="last")
fig = px.choropleth(deaths, locations="country", locationmode="country names",
                    color="Deaths", hover_name="Deaths",
                    color_continuous_scale="portland",
                    template="plotly_dark",
                    title="World map of Death Cases")
fig.show()
As the world map above shows, the United States has the most deaths, with a total of 148,011.
# Top 20 countries by confirmed cases, reversed so the largest bar is drawn at the top
con = data.drop_duplicates(subset="country", keep="last").sort_values(by="Confirmed", ascending=False).head(20)[::-1]
fig = px.bar(con, y="country", x="Confirmed",
             template="plotly_dark", height=800,
             color_discrete_sequence=["lightblue"], text="Confirmed",
             title="Top 20 countries of highest Confirmed Cases in the world")
fig.show()
As the bar chart above shows, the United States has the most confirmed cases in the world, followed by Brazil and India.
# Top 20 countries by deaths
death = data.drop_duplicates(subset="country", keep="last").sort_values(by="Deaths", ascending=False).head(20)[::-1]
fig = px.bar(death, y="country", x="Deaths",
             template="plotly_dark", height=800,
             color_discrete_sequence=["red"], text="Deaths",
             title="Top 20 countries of highest Death Cases in the world")
fig.show()
As the bar chart above shows, the United States has the most deaths in the world, followed by Brazil and Mexico.
# Top 20 countries by active cases
active = data.drop_duplicates(subset="country", keep="last").sort_values(by="Active", ascending=False).head(20)[::-1]
fig = px.bar(active, y="country", x="Active",
             template="plotly_dark", height=800,
             color_discrete_sequence=["green"], text="Active",
             title="Top 20 countries of highest Active Cases in the world")
fig.show()
As the bar chart above shows, the United States has the most active cases in the world, followed by Brazil and India.
# Top 20 countries by recovered cases
recovered = data.drop_duplicates(subset="country", keep="last").sort_values(by="Recovered", ascending=False).head(20)[::-1]
fig = px.bar(recovered, y="country", x="Recovered",
             template="plotly_dark", height=800,
             color_discrete_sequence=["purple"], text="Recovered",
             title="Top 20 countries of highest Recovered Cases in the world")
fig.show()
As the bar chart above shows, Brazil has the most recovered cases in the world, followed by the United States and India.
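The four bar charts above repeat the same sort, truncate, and reverse steps with a different column each time. A minimal sketch of how that pattern could be wrapped in a helper; the name `top_countries` and the sample frame are hypothetical and not part of the original notebook:

```python
import pandas as pd


def top_countries(latest: pd.DataFrame, metric: str, n: int = 20) -> pd.DataFrame:
    """Return the n rows with the largest `metric`, ordered smallest first.

    The reversed order makes a horizontal bar chart draw the largest bar on top.
    """
    return latest.sort_values(by=metric, ascending=False).head(n)[::-1]


# Hypothetical sample with one row per country.
sample = pd.DataFrame({"country": ["A", "B", "C"], "Recovered": [3, 9, 5]})

top2 = top_countries(sample, "Recovered", n=2)
print(top2["country"].tolist())  # ['C', 'B']
```

The returned frame can be passed to `px.bar` exactly as `con`, `death`, `active`, and `recovered` are above.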
# Sum confirmed cases over all locations for each date
pConfirm = data.pivot_table(index="Date", values=["Confirmed"], aggfunc="sum").reset_index()
fig = px.line(pConfirm, x="Date", y="Confirmed", template="plotly_dark",
              title="Global Confirmed Cases Over Time",
              color_discrete_sequence=["lightblue"], hover_name="Confirmed")
fig.show()
As the chart above shows, confirmed cases have followed a clear upward trend since April, reaching approximately 16.48 million in July.
# Sum deaths over all locations for each date
pDeath = data.pivot_table(index="Date", values=["Deaths"], aggfunc="sum").reset_index()
fig = px.line(pDeath, x="Date", y="Deaths", template="plotly_dark",
              title="Global Death Cases Over Time",
              color_discrete_sequence=["red"], hover_name="Deaths")
fig.show()
As the chart above shows, deaths have followed a clear upward trend since April, reaching 654,036 in July.
# Sum active cases over all locations for each date
pActive = data.pivot_table(index="Date", values=["Active"], aggfunc="sum").reset_index()
fig = px.line(pActive, x="Date", y="Active", template="plotly_dark",
              title="Global Active Cases Over Time",
              color_discrete_sequence=["green"], hover_name="Active")
fig.show()
As the chart above shows, active cases have followed a clear upward trend since April, reaching 6,358,362 in July.
# Sum recovered cases over all locations for each date
pRecovered = data.pivot_table(index="Date", values=["Recovered"], aggfunc="sum").reset_index()
fig = px.line(pRecovered, x="Date", y="Recovered", template="plotly_dark",
              title="Global Recovered Cases Over Time",
              color_discrete_sequence=["purple"], hover_name="Recovered")
fig.show()
As the chart above shows, recovered cases have followed a clear upward trend since April, reaching 9,468,087 in July.
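The four July figures can be cross-checked against each other. If active cases are defined as confirmed minus deaths minus recovered (an assumption about how the dataset was built), then deaths, active, and recovered should sum to the confirmed total:

```python
# July figures quoted in the text above.
deaths = 654_036
active = 6_358_362
recovered = 9_468_087

# Under the assumed definition, these three parts add up to the confirmed total.
total = deaths + active + recovered
print(f"{total:,}")  # 16,480,485
```

The sum, 16,480,485, agrees with the roughly 16.48 million confirmed cases reported for July.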
# Create a list of Asian countries
asia_list = ['China', 'India', 'Indonesia', 'Pakistan', 'Bangladesh', 'Japan', 'Philippines', 'Vietnam', 'Turkey', 'Iran', 'Thailand', 'Myanmar', 'South Korea', 'Iraq',
             'Afghanistan', 'Saudi Arabia', 'Uzbekistan', 'Malaysia', 'Yemen', 'Nepal', 'North Korea', 'Sri Lanka', 'Kazakhstan', 'Syria', 'Cambodia', 'Jordan', 'Azerbaijan',
             'United Arab Emirates', 'Tajikistan', 'Israel', 'Laos', 'Lebanon', 'Kyrgyzstan', 'Turkmenistan', 'Singapore', 'Oman',
             'State of Palestine', 'Kuwait', 'Georgia', 'Mongolia', 'Armenia', 'Qatar', 'Bahrain', 'Timor-Leste', 'Cyprus', 'Bhutan', 'Maldives', 'Brunei']
# Keep only the rows whose country appears in the list
asia = data[data["country"].isin(asia_list)]
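Filtering with `isin` silently drops any list entry whose spelling differs from the name used in the dataset. A minimal sketch of a check for such mismatches, using hypothetical stand-ins for the real column and list (the names below are illustrative, not a statement about this dataset's contents):

```python
import pandas as pd

# Hypothetical stand-ins for data["country"] and the hand-written list.
countries_in_data = pd.Series(["India", "Iran", "Burma", "Japan"], name="country")
wanted = ["India", "Iran", "Myanmar", "Japan"]

# List entries that never appear in the data.
missing = sorted(set(wanted) - set(countries_in_data.unique()))
print(missing)  # ['Myanmar']
```

Running the same comparison with `asia_list` and `data["country"]` would show which entries, if any, match no rows and are therefore absent from the charts.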
# Keep the last row for each Asian country
asiaD = asia.drop_duplicates(subset=["country"], keep="last")
asiaCon = asiaD.sort_values(by="Confirmed", ascending=False).head(10)[::-1]
fig = px.bar(asiaCon, y="country", x="Confirmed", template="plotly_dark",
             title="Top 10 countries of highest Confirmed cases in Asia", color_discrete_sequence=["lightblue"],
             height=700, text="Confirmed")
fig.show()
As the bar chart above shows, India has the most confirmed cases in Asia, followed by Iran and Pakistan.
asiaDea = asiaD.sort_values(by="Deaths", ascending=False).head(10)[::-1]
fig = px.bar(asiaDea, y="country", x="Deaths", template="plotly_dark",
             title="Top 10 countries of highest Deaths cases in Asia", color_discrete_sequence=["red"],
             height=700, text="Deaths")
fig.show()
As the bar chart above shows, India has the most deaths in Asia, followed by Iran and Pakistan.
asiaAct = asiaD.sort_values(by="Active", ascending=False).head(10)[::-1]
fig = px.bar(asiaAct, y="country", x="Active", template="plotly_dark",
             title="Top 10 countries of highest Active cases in Asia", color_discrete_sequence=["green"],
             height=700, text="Active")
fig.show()
As the bar chart above shows, India has the most active cases in Asia, followed by Bangladesh and the Philippines.
asiaRec = asiaD.sort_values(by="Recovered", ascending=False).head(10)[::-1]
fig = px.bar(asiaRec, y="country", x="Recovered", template="plotly_dark",
             title="Top 10 countries of highest Recovered cases in Asia", color_discrete_sequence=["purple"],
             height=700, text="Recovered")
fig.show()
As the bar chart above shows, India has the most recovered cases in Asia, followed by Iran and Pakistan.
# Create a list of European countries
euro_list = ['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czechia', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Ireland',
             'Italy', 'Latvia', 'Luxembourg', 'Lithuania', 'Malta', 'Norway', 'Netherlands', 'Poland', 'Portugal', 'Romania', 'Slovakia', 'Slovenia',
             'Spain', 'Sweden', 'United Kingdom', 'Iceland', 'Russia', 'Switzerland', 'Serbia', 'Ukraine', 'Belarus',
             'Albania', 'Bosnia and Herzegovina', 'Kosovo', 'Moldova', 'Montenegro', 'North Macedonia']
# Keep only the rows whose country appears in the list
euro = data[data["country"].isin(euro_list)]
# Keep the last row for each European country
eD = euro.drop_duplicates(subset=["country"], keep="last")
euroCon = eD.sort_values(by="Confirmed", ascending=False).head(10)[::-1]
fig = px.bar(euroCon, y="country", x="Confirmed", template="plotly_dark",
             title="Top 10 countries of highest Confirmed cases in Europe", color_discrete_sequence=["lightblue"],
             height=700, text="Confirmed")
fig.show()
As the bar chart above shows, Russia has the most confirmed cases in Europe, followed by Spain and Italy.
euroDea = eD.sort_values(by="Deaths", ascending=False).head(10)[::-1]
fig = px.bar(euroDea, y="country", x="Deaths", template="plotly_dark",
             title="Top 10 countries of highest Deaths cases in Europe", color_discrete_sequence=["red"],
             height=700, text="Deaths")
fig.show()
As the bar chart above shows, Italy has the most deaths in Europe, followed by Spain and Russia.
euroActive = eD.sort_values(by="Active", ascending=False).head(10)[::-1]
fig = px.bar(euroActive, y="country", x="Active", template="plotly_dark",
             title="Top 10 countries of highest Active cases in Europe", color_discrete_sequence=["green"],
             height=700, text="Active")
fig.show()
As the bar chart above shows, Russia has the most active cases in Europe, followed by Spain and Sweden.
euroRec = eD.sort_values(by="Recovered", ascending=False).head(10)[::-1]
fig = px.bar(euroRec, y="country", x="Recovered", template="plotly_dark",
             title="Top 10 countries of highest Recovered cases in Europe", color_discrete_sequence=["purple"],
             height=700, text="Recovered")
fig.show()
As the bar chart above shows, Russia has the most recovered cases in Europe, followed by Italy and Germany.